knitr::opts_chunk$set(echo = TRUE)

Data Science Programs

In this report I carry out an exploratory analysis in R of a table containing data on Data Science (DS) programs in the USA.

The table timesMergedData.csv can be downloaded here: https://www.kaggle.com/sriharirao/datascience-universities-across-us/data.

Column metadata

First I load timesMergedData.csv into dsp.

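# stringsAsFactors = TRUE reproduces the factor columns shown by str(dsp);
# it has not been the default since R 4.0
dsp <- read.csv('timesMergedData.csv', stringsAsFactors = TRUE)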

dim(dsp) returns the number of rows and columns in the data frame.

dim(dsp)
## [1] 954  27

The metadata for the table is not available on the web. For that reason, the table below describes the contents of each column; I use a question mark ? where I could not make sense of a column's contents.

Column                  Type     Description
SCHOOL                  String   School
STATE                   String   State
CITY                    String   City
NOC                     Numeric  ?
PROGRAM                 String   Program
TYPE                    String   ‘C’ - Certificate, ‘M’ - Master
DEPARTMENT              String   Department
DELIVERY                String   Campus, online, or hybrid
DURATION                String   Duration
PREREQ                  String   Prerequisites
LINK                    String   Link
LOC_LAT                 Numeric  Latitude
LOC_LONG                Numeric  Longitude
WORLD_RANK              Numeric  World ranking
COUNTRY                 String   USA
TEACHING                Numeric  ?
INTERNATIONAL           Numeric  ?
RESEARCH                Numeric  ?
CITATIONS               Numeric  ?
INCOME                  Numeric  ?
TOTAL_SCORE             Numeric  ?
NUM_STUDENTS            Numeric  Number of students
STUDENT_STAFF_RATIO     Numeric  ?
INTERNATIONAL_STUDENTS  String   Percentage of international students
F_M_RATIO               String   ?
YEAR                    Numeric  Year
timesData               Numeric  ?

I use str(dsp) to inspect the structure of the data frame.

str(dsp)
## 'data.frame':    954 obs. of  27 variables:
##  $ SCHOOL                : Factor w/ 219 levels "Albright College",..: 1 2 3 3 4 4 4 4 4 4 ...
##  $ STATE                 : Factor w/ 40 levels "Alabama","Arizona",..: 32 5 7 7 2 2 2 2 2 2 ...
##  $ CITY                  : Factor w/ 173 levels "Adelphi","Albuquerque",..: 127 11 164 164 155 155 155 155 155 155 ...
##  $ NOC                   : int  1 1 2 2 1 1 1 1 1 1 ...
##  $ PROGRAM               : Factor w/ 312 levels "Advanced Certificate in Applied Statistics",..: 91 176 274 179 198 198 198 198 198 198 ...
##  $ TYPE                  : Factor w/ 2 levels "C","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ DEPARTMENT            : Factor w/ 274 levels "Adult and Graduate Studies",..: 113 190 161 161 269 269 269 269 269 269 ...
##  $ DELIVERY              : Factor w/ 14 levels "Blended","Campus",..: 9 9 9 9 11 11 11 11 11 11 ...
##  $ DURATION              : Factor w/ 189 levels "1 Year","1 semester",..: 70 114 35 90 181 181 181 181 181 181 ...
##  $ PREREQ                : Factor w/ 31 levels "Not Available",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LINK                  : Factor w/ 372 levels "dead link - program appears to no longer be offered",..: 153 155 337 154 339 339 339 339 339 339 ...
##  $ LOC_LAT               : num  40.4 39.7 38.9 38.9 33.4 ...
##  $ LOC_LONG              : num  -75.9 -104.8 -77.1 -77.1 -111.9 ...
##  $ WORLD_RANK            : Factor w/ 138 levels "1","10","102",..: NA NA 92 92 30 38 32 19 53 56 ...
##  $ COUNTRY               : Factor w/ 1 level "United States of America": NA NA 1 1 1 1 1 1 1 1 ...
##  $ TEACHING              : num  NA NA 42.2 42.2 33.8 43 38.4 38.2 35.7 32.4 ...
##  $ INTERNATIONAL         : num  NA NA 28.9 28.9 28.6 24.1 27.4 26.1 29.5 31.9 ...
##  $ RESEARCH              : num  NA NA 16.5 16.5 35.9 44.1 45.2 39 37.5 38.1 ...
##  $ CITATIONS             : num  NA NA 41.1 41.1 83.6 66.9 79.9 80.3 73.1 84.6 ...
##  $ INCOME                : Factor w/ 143 levels "-","100","24.2",..: NA NA 58 58 32 1 34 12 39 35 ...
##  $ TOTAL_SCORE           : Factor w/ 160 levels "-","44.8","45",..: NA NA 1 1 22 30 34 27 14 26 ...
##  $ NUM_STUDENTS          : Factor w/ 58 levels "10,646","10,788",..: NA NA 4 4 56 56 56 56 56 56 ...
##  $ STUDENT_STAFF_RATIO   : num  NA NA 12 12 29.9 29.9 29.9 29.9 29.9 29.9 ...
##  $ INTERNATIONAL_STUDENTS: Factor w/ 25 levels "10%","11%","12%",..: NA NA 3 3 25 25 25 25 25 25 ...
##  $ F_M_RATIO             : Factor w/ 27 levels "","1.011805556",..: NA NA 25 25 15 15 15 15 15 15 ...
##  $ YEAR                  : int  NA NA 2016 2016 2014 2011 2013 2012 2015 2016 ...
##  $ timesData             : int  0 0 1 1 1 1 1 1 1 1 ...

I drop the columns I will not use.

dsp <- dsp[, c('SCHOOL',
               'STATE',
               'CITY',
               'PROGRAM',
               'TYPE',
               'DEPARTMENT',
               'DELIVERY',
               'LINK',
               'LOC_LAT',
               'LOC_LONG',
               'NUM_STUDENTS',
               'INTERNATIONAL_STUDENTS',
               'YEAR')]
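
Two of the retained columns, NUM_STUDENTS and INTERNATIONAL_STUDENTS, hold numbers formatted as text (for example "42,056" and "19%"). A minimal sketch of how they could be converted for numeric work, using toy values in those formats; parse_count() and parse_percent() are hypothetical helper names, not part of the original analysis:

```r
# Hypothetical helpers (not from the original analysis)
parse_count <- function(x) {
  # Drop thousands separators, e.g. "42,056" -> 42056
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

parse_percent <- function(x) {
  # Drop the trailing percent sign, e.g. "19%" -> 19
  as.numeric(sub("%$", "", as.character(x)))
}

# Toy values in the same formats as the two columns
num_students  <- parse_count(factor(c("42,056", "36,108", NA)))
intl_students <- parse_percent(factor(c("19%", "6%", NA)))
```

Going through as.character() first matters when the column is a factor; calling as.numeric() directly on a factor would return the internal level codes instead of the values.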

Exploratory Analysis

The columns in dsp are mostly categorical variables, so the data visualizations below rely heavily on bar charts. The page http://uc-r.github.io/barcharts is an excellent resource for producing different kinds of bar charts in R with dplyr and ggplot2.

For the maps I use the leaflet and rgdal packages. The shapefile (.shp) with the US states is available at https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html.

First I load the required packages.

require(magrittr, quietly = TRUE, warn.conflicts = FALSE)
require(dplyr, quietly = TRUE, warn.conflicts = FALSE)
require(tidyr, quietly = TRUE, warn.conflicts = FALSE)
require(ggplot2, quietly = TRUE, warn.conflicts = FALSE)
require(leaflet, quietly = TRUE, warn.conflicts = FALSE)
require(rgdal, quietly = TRUE, warn.conflicts = FALSE)
## rgdal: version: 1.2-8, (SVN revision 663)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.2, released 2016/10/24
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-5
require(ngram, quietly = TRUE, warn.conflicts = FALSE)

YEAR

How many records are there for each year (YEAR)?

summary(as.factor(dsp$YEAR))
## 2011 2012 2013 2014 2015 2016 NA's 
##   87  114  110  112  115  134  282

From here on I keep only the 2016 records and the records with a missing year (NA).

dsp %<>% filter(YEAR == 2016 | is.na(YEAR))

I keep only the unique observations.

dsp %<>% distinct() 
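
The is.na(YEAR) term in the filter is doing real work: YEAR == 2016 evaluates to NA for rows with a missing year, and filter() drops rows whose condition is NA. A minimal sketch on toy data (the toy data frame and its values are made up for illustration):

```r
library(dplyr)

# Toy stand-in for dsp, with one missing year and one duplicated row
toy <- data.frame(
  SCHOOL = c("A", "A", "B", "C", "C"),
  YEAR   = c(2015, 2016, NA, 2016, 2016),
  stringsAsFactors = FALSE
)

# Without is.na(), the row with a missing YEAR is silently dropped
only_2016 <- toy %>% filter(YEAR == 2016)

# With is.na(), the missing-year row is kept; distinct() then removes the duplicate
kept <- toy %>% filter(YEAR == 2016 | is.na(YEAR)) %>% distinct()
```

Here only_2016 has no row for school B, while kept retains it and collapses the two identical rows for school C into one.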

DELIVERY

The delivery-format categories in DELIVERY are:

#summary(dsp$DELIVERY)
dsp %>% group_by(DELIVERY) %>% summarize(n=n())
## # A tibble: 14 x 2
##                                     DELIVERY     n
##                                       <fctr> <int>
##  1                                   Blended     1
##  2                                    Campus   246
##  3                         Campus and online     1
##  4                          Campus or Online    12
##  5                          Campus or online     3
##  6                            Campus, Online     1
##  7                                    Hybrid     8
##  8                                 On Campus     1
##  9                                    Online   134
## 10 Online (one Saturday per month on-campus)     1
## 11                          Online or Campus     2
## 12                       Online or On Campus     3
## 13                          Online or campus     1
## 14                 Online, campus, or hybrid     1

I collapse the redundant categories of DELIVERY into a new column, DELIVERY2.

dsp %<>% 
  mutate(DELIVERY2 = recode(DELIVERY, 
                            'Blended' = 'Hybrid',
                            'Campus and online' = 'Hybrid',
                            'Campus or Online' = 'Campus or online',
                            'Campus, Online' = 'Campus or online',
                            'On Campus' = 'Campus',
                            'Online (one Saturday per month on-campus)' = 'Hybrid',
                            'Online or Campus' = 'Campus or online',
                            'Online or On Campus' = 'Campus or online',
                            'Online or campus' = 'Campus or online',
                            'Online, campus, or hybrid' = 'Campus or online'))
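
A quick way to check a mapping like this is to count the old and new values side by side. The sketch below uses a toy data frame with a handful of the spellings; it relies on recode() leaving values without a replacement unchanged:

```r
library(dplyr)

# Toy stand-in with a few of the DELIVERY spellings
toy <- data.frame(
  DELIVERY = c("Blended", "On Campus", "Online or Campus", "Campus", "Online"),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(DELIVERY2 = recode(DELIVERY,
                            "Blended" = "Hybrid",
                            "On Campus" = "Campus",
                            "Online or Campus" = "Campus or online"))

# One row per (old, new) pair, so every mapping can be inspected at a glance;
# "Campus" and "Online" have no replacement and pass through as they are
mapping <- toy %>% count(DELIVERY, DELIVERY2)
```

Running the same count() on dsp after the recode would list which original category ended up under which DELIVERY2 value.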

The categories of DELIVERY2 are:

dsp %>% group_by(DELIVERY2) %>% summarize(n=n())
## # A tibble: 4 x 2
##          DELIVERY2     n
##             <fctr> <int>
## 1           Hybrid    11
## 2           Campus   247
## 3 Campus or online    23
## 4           Online   134
# use a constant x ('FORMATO') so that every category stacks into a single bar
ggplot(dsp, aes(x = 'FORMATO', fill = DELIVERY2)) + 
  geom_bar(position = position_stack(), colour = 'grey', alpha = 0.7, width = .5) +
  labs(title = "number of DS programs by DELIVERY2", x = "", y = "")

SCHOOL

Which school (SCHOOL) has the most programs?

dsp %>% group_by(SCHOOL, STATE) %>% tally() %>% arrange(desc(n)) %>% filter(n > 2)
## # A tibble: 53 x 3
## # Groups:   SCHOOL [53]
##                            SCHOOL                STATE     n
##                            <fctr>               <fctr> <int>
##  1             Bentley University        Massachusetts     9
##  2            New York University             New York     9
##  3              Boston University        Massachusetts     6
##  4 Indiana University Bloomington              Indiana     6
##  5            Stanford University           California     6
##  6   George Washington University District of Columbia     5
##  7       Johns Hopkins University             Maryland     5
##  8        Northeastern University        Massachusetts     5
##  9        Northwestern University             Illinois     5
## 10             Rutgers University           New Jersey     5
## # ... with 43 more rows
# geom_col() instead of geom_bar(stat = 'identity')
ggplot(dsp %>% 
         group_by(SCHOOL, STATE) %>% 
         tally() %>% 
         arrange(desc(n)) %>% 
         filter(n > 2), 
       aes(x = reorder(SCHOOL, n), y = n, fill = n)) + 
  geom_col(colour = 'grey', alpha = 0.7) +
  scale_fill_gradient(high = 'orchid4', low = 'orchid') +
  coord_flip() +
  labs(title = "number of DS programs by SCHOOL", x = "SCHOOL", y = "count") +
  theme(legend.position = 'none') + 
  scale_y_continuous(breaks = 0:15)

Next I look at the schools with the most programs, broken down by delivery format (DELIVERY2).

ggplot(dsp %>% 
         group_by(SCHOOL, STATE, DELIVERY2) %>% 
         tally() %>% 
         arrange(desc(n)) %>% 
         filter(n > 2), 
       aes(x = reorder(SCHOOL, n), y = n, fill = DELIVERY2)) + 
  geom_col(colour = 'grey', alpha = 0.7) +
  coord_flip() +
  labs(title = "number of DS programs by SCHOOL", x = "SCHOOL", y = "count")  + 
  scale_y_continuous(breaks = 0:15)
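
Because the counts above are grouped by DELIVERY2 before filter(n > 2) is applied, the filter keeps school and format pairs with more than two programs, not schools with more than two programs overall. If the school total is the quantity of interest, one possible variant is sketched below on toy data (toy, per_format and per_school are illustrative names):

```r
library(dplyr)

# Toy data: school A has 3 programs split 2 + 1, school B has 4 in one format
toy <- data.frame(
  SCHOOL    = c("A", "A", "A", "B", "B", "B", "B"),
  DELIVERY2 = c("Campus", "Campus", "Online", "Online", "Online", "Online", "Online"),
  stringsAsFactors = FALSE
)

# Filter on the per-format count: only (SCHOOL, DELIVERY2) pairs with n > 2 survive
per_format <- toy %>%
  count(SCHOOL, DELIVERY2) %>%
  filter(n > 2)

# Filter on the school total: every format of a school with > 2 programs survives
per_school <- toy %>%
  count(SCHOOL, DELIVERY2) %>%
  group_by(SCHOOL) %>%
  filter(sum(n) > 2) %>%
  ungroup()
```

In this toy case per_format drops school A entirely, while per_school keeps both of its formats.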

STATE

How many programs are there per state (STATE)?

ggplot(dsp, aes(STATE)) + 
  geom_bar(fill = 'skyblue', colour = 'grey', alpha = .5) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position = 'none') +
  labs(title = "number of DS programs by STATE", x = "STATE", y = "count")

I split the programs by state (STATE) and delivery format (DELIVERY2).

ggplot(dsp, aes(STATE, fill = DELIVERY2)) +
  geom_bar(colour = 'grey', alpha = 0.7) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  labs(title = "number of DS programs by STATE", x = "STATE", y = "count")

I plot on a map the number of programs delivered on campus in each state.

map <- readOGR(dsn = "./cb_2016_us_state_500k", layer = "cb_2016_us_state_500k", encoding = "UTF-8")
## OGR data source with driver: ESRI Shapefile 
## Source: "./cb_2016_us_state_500k", layer: "cb_2016_us_state_500k"
## with 56 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
map <- map[!map@data$NAME %in% c('Alaska', 
                                 'Hawaii',
                                 'Puerto Rico',
                                 'Guam',
                                 'United States Virgin Islands',
                                 'Commonwealth of the Northern Mariana Islands',
                                 'American Samoa'), ]
count = dsp %>% filter(DELIVERY2 == 'Campus') %>% group_by(STATE) %>% summarise(count_STATE = n())

map@data$count_STATE = count$count_STATE[match(map@data$NAME, count$STATE)]

pal <- colorNumeric("Reds", c(0, max(map@data$count_STATE, na.rm = TRUE)))

banner <- paste("<strong>State: </strong>", 
                    map@data$NAME, 
                    "<br>DS-programs: ", 
                    map@data$count_STATE)

leaflet(data = map) %>%
  addTiles() %>%
  addPolygons(fillOpacity = 0.8,
              smoothFactor = 0.5,
              color = ~pal(count_STATE),
              popup = banner) %>% 
  addLegend("bottomright",
            values = ~count_STATE,
            pal = pal)  %>%
  addPolylines(color = "red")
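
The step that attaches the counts to the map relies on match(), which returns, for each state name in the shapefile, the position of that name in the count table, or NA when the state has no on-campus programs. A minimal base R sketch with toy values (the state names and counts are made up):

```r
# Toy stand-ins: names in shapefile order, and counts in a different order
state_names <- c("Texas", "Ohio", "Maine")
count <- data.frame(
  STATE       = c("Ohio", "Texas"),
  count_STATE = c(5L, 12L),
  stringsAsFactors = FALSE
)

# Position of each shapefile name within count$STATE (NA if absent)
idx <- match(state_names, count$STATE)

# Indexing with idx reorders the counts to follow the shapefile order
aligned <- count$count_STATE[idx]
```

States that do not appear in the count table end up as NA, which is why the palette range is computed with na.rm = TRUE.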

CITY

Which city (CITY) has the most programs?

dsp %>% group_by(CITY, STATE) %>% tally() %>% arrange(desc(n))
## # A tibble: 180 x 3
## # Groups:   CITY [172]
##            CITY                STATE     n
##          <fctr>               <fctr> <int>
##  1     New York             New York    17
##  2       Boston        Massachusetts    13
##  3      Chicago             Illinois    12
##  4      Waltham        Massachusetts    11
##  5   Washington District of Columbia     9
##  6    Baltimore             Maryland     8
##  7       Denver             Colorado     8
##  8 Philadelphia         Pennsylvania     7
##  9    Rochester             New York     7
## 10  Bloomington              Indiana     6
## # ... with 170 more rows
ggplot(dsp %>% 
         group_by(CITY, STATE) %>% 
         tally() %>% 
         arrange(desc(n)) %>% 
         filter(n > 2), 
       aes(x = reorder(CITY, n), y = n)) + 
  geom_bar(stat = 'identity', fill = 'tomato', colour = 'grey', alpha = 0.5) +
  coord_flip() + 
  geom_text(aes(label = n), nudge_y = 1, color = 'tomato', size = 2.5) +
  labs(title = "number of DS programs by CITY", x = "CITY", y = "count")

Next I look at the cities with the most programs, broken down by delivery format (DELIVERY2).

ggplot(dsp %>% 
         group_by(CITY, STATE, DELIVERY2) %>% 
         tally() %>% 
         arrange(desc(n)) %>% 
         filter(n > 2), 
       aes(x = reorder(CITY, n), y = n, fill = DELIVERY2)) + 
  geom_bar(stat = 'identity', colour = 'grey', alpha = 0.7) +
  coord_flip() +
  labs(title = "number of DS programs by CITY", x = "CITY", y = "count")

PROGRAM & TYPE

The categories of PROGRAM are:

#summary(dsp$PROGRAM)
dsp %>% group_by(PROGRAM) %>% summarize(n=n()) %>% arrange(desc(n))
## # A tibble: 311 x 2
##                                       PROGRAM     n
##                                        <fctr> <int>
##  1    Master of Science in Business Analytics    24
##  2             Master of Science in Analytics    10
##  3 Graduate Certificate in Business Analytics     9
##  4          Master of Science in Data Science     8
##  5    Master of Science in Applied Statistics     7
##  6   Master of Science in Information Systems     7
##  7        Master of Science in Data Analytics     5
##  8    Master of Science in Health Informatics     4
##  9   Online Master of Science in Data Science     4
## 10                Certificate in Data Science     3
## # ... with 301 more rows

I create a new column, PROGRAM2, by replacing keywords in PROGRAM with tags.

# \\b matches a word boundary
PROGRAM2 <- as.character(dsp$PROGRAM)
PROGRAM2 %<>% 
  tolower() %>%  
  gsub('\\.|\\:|\\,', '', .) %>%
  gsub('\\&', 'and', .) %>%
  gsub('\\(|\\)', '', .) %>%
  
  gsub('master\'s', 'master', .) %>% 
  gsub('masters', 'master', .) %>%
  gsub('\\bms\\b', '*M* *MS* *Sc*', .) %>%
  gsub('master of science', '*M* *MS* *Sc*', .) %>%
  gsub('\\bmba\\b', '*M* *MBA* *B*', .) %>%
  gsub('master of business administration', '*M* *MBA* *B*', .) %>%
  gsub('master of business and science', '*M* *MBS* *B* *Sc*', .) %>%
  gsub('master', '*M*', .) %>%
  
  gsub('diploma', '*Cert*', .) %>%
  gsub('certificate', '*Cert*', .) %>%
  
  gsub('doctor', '*PhD*', .) %>%
  gsub('phd', '*PhD*', .) %>%
  
  gsub('\\bds\\b', '*DS*', .) %>%
  gsub('computational data science', '*DS* *CS* *CDS*', .) %>%
  gsub('computational and data science', '*DS* *CS* *CDS*', .) %>%
  gsub('data science', '*DS*', .) %>%
  gsub('computer science', '*CS*', .) %>%
  gsub('computational science', '*CS*', .) %>%
  
  gsub('business analytics', '*BI* *B* *Analytics*', .) %>%
  gsub('\\bbi\\b', '*BI* *B* *Analytics*', .) %>%
  gsub('business intelligence', '*BI* *B* *Analytics*', .) %>%
  
  gsub('data mining', 'mining', .) %>%
  gsub('mining', '*Analytics*', .) %>%
  
  gsub('applied statistics', 'statistics', .) %>%
  gsub('statistical', 'statistics', .) %>%
  gsub('statistics', '*Stats*', .) %>%
  
  gsub('informatics', 'analytics', .) %>%
  gsub('data analytics', 'analytics', .) %>%
  gsub('analytics', '*Analytics*', .) %>%
  
  gsub('information systems technology', '*IS* *IT*', .) %>%
  gsub('management information systems', '*IM* *IS*', .) %>%
  gsub('information management', 'im', .) %>%
  gsub('information systems', 'is', .) %>%
  gsub('\\bim\\b', '*IM*', .) %>%
  gsub('\\bis\\b', '*IS*', .) %>%
  
  
  gsub('information technology', 'it', .) %>%
  gsub('\\bit\\b', '*IT*', .) %>%
  
  gsub('health', '*Health-Bio*', .) %>%
  gsub('bio', '*Health-Bio*', .) %>%
  gsub('urban', '*Urban*', .) %>%
  gsub('public', '*Public*', .) %>% 
  
  gsub('\\bin\\b|\\band\\b|\\bof\\b|\\bwith\\b|\\ba\\b|\\bfor\\b|\\bthe\\b|\\bat\\b', '', .) %>% 
  gsub('^a |^the ', '', .)

dsp$PROGRAM2 <- PROGRAM2

words <- dsp %>% select(PROGRAM, PROGRAM2)
write.csv(words, 'words.csv')

#words <- unlist(strsplit(PROGRAM2," "))
#words <- as.data.frame(table(words)) %>% arrange(desc(Freq))
#write.csv(words, 'words.csv')
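
The order of the substitutions matters: specific phrases have to be replaced before the generic words they contain, and \\b keeps short abbreviations from matching inside longer words. A reduced sketch with only four of the rules, applied to made-up program names (tag_program() is a hypothetical helper, not part of the original code):

```r
# Hypothetical helper: four of the substitutions, in the same relative order
tag_program <- function(x) {
  x <- tolower(x)
  x <- gsub("\\bms\\b", "*M* *MS* *Sc*", x)           # "ms" only as a whole word
  x <- gsub("master of science", "*M* *MS* *Sc*", x)  # specific phrase first
  x <- gsub("master", "*M*", x)                       # generic fallback afterwards
  x <- gsub("data science", "*DS*", x)
  x
}

tagged <- tag_program(c("Master of Science in Data Science",
                        "Master of Data Science",
                        "MS in Systems"))
```

In the third name the word boundary stops the "ms" at the end of "systems" from being tagged; if the generic "master" rule ran first, the first name would lose its *MS* tag.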

What type (TYPE) are the programs? How many are 'C' (Certificate) and how many are 'M' (Master)?

ggplot(dsp, aes(TYPE, fill = TYPE)) +
  geom_bar() +
  geom_text(stat = 'count', aes(label = ..count.., y = ..count..), vjust = 1.5)

I derive a new column, TYPE2, and a finer-grained column, PROG, from the tags in PROGRAM2.

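dsp$TYPE2 <- NA
dsp$PROG <- NA
# NOTE: grep() reads its pattern as a regular expression, so the '*' in these
# tags is not a literal asterisk; rows without an *M* tag (e.g. 'msm - ...')
# still end up with TYPE2 = 'M'. fixed = TRUE would match the tags literally.
dsp$TYPE2[grep("*M*", dsp$PROGRAM2)] <- 'M'
dsp$PROG[grep("*MBA*", dsp$PROGRAM2)] <- 'MBA'
dsp$PROG[grep("MBS", dsp$PROGRAM2)] <- 'MBS'
dsp$PROG[grep("MS", dsp$PROGRAM2)] <- 'MS'
dsp[grep("*PhD*", dsp$PROGRAM2), c('TYPE2','PROG')] <- 'PhD'
dsp[grep("*Cert*", dsp$PROGRAM2), c('TYPE2','PROG')] <- 'Cert'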
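
For comparison, a minimal sketch of literal tag matching with fixed = TRUE, which treats the pattern as a plain string rather than a regular expression. The strings in prog2 are toy values in the PROGRAM2 format, and is_master and is_cert are illustrative names:

```r
# Toy PROGRAM2-style strings
prog2 <- c("*M* *MS* *Sc*  *DS*",
           "msm - *BI* *B* *Analytics*",
           "*Cert*  *DS*")

# With fixed = TRUE the asterisks are ordinary characters,
# so only strings that contain the exact tag match
is_master <- grepl("*M*", prog2, fixed = TRUE)
is_cert   <- grepl("*Cert*", prog2, fixed = TRUE)
```

With literal matching, the second string is not flagged as a Master's program, so such rows would need their own keyword rule (for example for "msm") to be classified.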

dsp %>% group_by(TYPE2, PROG) %>% tally()
## # A tibble: 6 x 3
## # Groups:   TYPE2 [?]
##   TYPE2  PROG     n
##   <chr> <chr> <int>
## 1  Cert  Cert    99
## 2     M   MBA    31
## 3     M   MBS     2
## 4     M    MS   219
## 5     M  <NA>    53
## 6   PhD   PhD    11
dsp[is.na(dsp$PROG),c('PROGRAM','PROGRAM2')]
##                                                                                            PROGRAM
## 9                                    Professional Science Master's in Data Management and Analysis
## 10                                    Professional Science Master's Degree in Predictive Analytics
## 15            Master of Professional Science in Technology Innovation with Focus in Bioinformatics
## 22                                                                    Master of Business Analytics
## 24                                                               Masters of Information Technology
## 35                                                     Master of Arts in Computational Linguistics
## 48  Master of Information Systems Management, Business Intelligence and Data Analytics (MISM-BIDA)
## 49                                                     Master of Computational Data Science (MCDS)
## 50                                                                        MSM - Business Analytics
## 62                                                           Master of Applied Statistics (M.A.S.)
## 68            Master of Professional Studies (MPS) in Applied Statistics (Option II: Data Science)
## 71                                                  MA in Data Analytics & Applied Social Research
## 76                                                           Online Master's in Health Informatics
## 79                                                               Master of Quantitative Management
## 80                                                            Master of Arts in Applied Statistics
## 82                                                                    Master in Health Informatics
## 87                                                                      Master's in Data Analytics
## 92                                                                  Masters in Information Systems
## 112                                                                         Master of Data Science
## 141                                                                   Master of Business Analytics
## 149                                                            Health Informatics Master's Program
## 153                                                                         Master in Data Science
## 168                                                  Master of Professional Studies in Informatics
## 183                                                                   Master of Applied Statistics
## 184                                              Master of Public Health in Biomedical Informatics
## 193                                               Master of Professional Studies in Data Analytics
## 194                                               Master of Professional Studies in Data Analytics
## 195                                                                   Master of Applied Statistics
## 215                                                                          Master of Information
## 220                           Online Master's in Health Administration: Informatics Specialization
## 221                                                              Applied Analytics Master's Degree
## 254                              Professional Science Master's Degree in Environmental Informatics
## 266                                                                   Master of Business Analytics
## 284                                              Online Master's in Management Information Systems
## 285                                                     Master's in Management Information Systems
## 287                                                     Professional Master of Information Systems
## 288                            Master of Information Systems with Business Analytics Concentration
## 289                                           Online Master of Information and Data Science (MIDS)
## 290                                Master of Engineering - Concentration in Data Science & Systems
## 291                                                   Master of Information Management and Systems
## 294                                                                 Master's of Health Informatics
## 300                                       Master of Advanced Study in Data Science and Engineering
## 304                               Professional Science Master's Program in Health Care Informatics
## 327                                                     Master of Computer Science in Data Science
## 336                                                               Master of Information Management
## 345                                                             Master's Program in Bioinformatics
## 360              Professional Science Master's (PSM) in Data Science and Business Analytics (DSBA)
## 368                                                           Master of Data Science and Analytics
## 371                                                               Master's Degree in Biostatistics
## 372                                                      Master's Degree in Biomedical Informatics
## 374                                                                      MSIS - Big Data Analytics
## 380                                                                   Master of Applied Statistics
## 410                                Master of Professional Studies in Statistical and Data Sciences
##                                                                             PROGRAM2
## 9                                professional science *M*  data management  analysis
## 10                           professional science *M* degree  predictive *Analytics*
## 15  *M*  professional science  technology innovation  focus  *Health-Bio**Analytics*
## 22                                                         *M*  *BI* *B* *Analytics*
## 24                                                                         *M*  *IT*
## 35                                              *M*  arts  computational linguistics
## 48                  *M*  *IS* management *BI* *B* *Analytics*  *Analytics* mism-bida
## 49                                                         *M*  *DS* *CS* *CDS* mcds
## 50                                                        msm - *BI* *B* *Analytics*
## 62                                                                  *M*  *Stats* mas
## 68                             *M*  professional studies mps  *Stats* option ii *DS*
## 71                                          ma  *Analytics*  applied social research
## 76                                              online *M*  *Health-Bio* *Analytics*
## 79                                                      *M*  quantitative management
## 80                                                                *M*  arts  *Stats*
## 82                                                     *M*  *Health-Bio* *Analytics*
## 87                                                                  *M*  *Analytics*
## 92                                                                         *M*  *IS*
## 112                                                                        *M*  *DS*
## 141                                                        *M*  *BI* *B* *Analytics*
## 149                                             *Health-Bio* *Analytics* *M* program
## 153                                                                        *M*  *DS*
## 168                                           *M*  professional studies  *Analytics*
## 183                                                                     *M*  *Stats*
## 184                      *M*  *Public* *Health-Bio*  *Health-Bio*medical *Analytics*
## 193                                           *M*  professional studies  *Analytics*
## 194                                           *M*  professional studies  *Analytics*
## 195                                                                     *M*  *Stats*
## 215                                                                 *M*  information
## 220               online *M*  *Health-Bio* administration *Analytics* specialization
## 221                                                   applied *Analytics* *M* degree
## 254                       professional science *M* degree  environmental *Analytics*
## 266                                                        *M*  *BI* *B* *Analytics*
## 284                                                            online *M*  *IM* *IS*
## 285                                                                   *M*  *IM* *IS*
## 287                                                           professional *M*  *IS*
## 288                                    *M*  *IS*  *BI* *B* *Analytics* concentration
## 289                                               online *M*  information  *DS* mids
## 290                                  *M*  engineering - concentration  *DS*  systems
## 291                                                               *M*  *IM*  systems
## 294                                                    *M*  *Health-Bio* *Analytics*
## 300                                           *M*  advanced study  *DS*  engineering
## 304                  professional science *M* program  *Health-Bio* care *Analytics*
## 327                                                                  *M*  *CS*  *DS*
## 336                                                                        *M*  *IM*
## 345                                             *M* program  *Health-Bio**Analytics*
## 360                    professional science *M* psm  *DS*  *BI* *B* *Analytics* dsba
## 368                                                           *M*  *DS*  *Analytics*
## 371                                                  *M* degree  *Health-Bio**Stats*
## 372                                      *M* degree  *Health-Bio*medical *Analytics*
## 374                                                           msis - big *Analytics*
## 380                                                                     *M*  *Stats*
## 410                                        *M*  professional studies  *Stats*  *DS*s

Here are the doctoral programs:

dsp %>% filter(TYPE2 == 'PhD')
##                                     SCHOOL         STATE             CITY
## 1                       Chapman University    California           Orange
## 2            Colorado Technical University      Colorado Colorado Springs
## 3           Indiana University Bloomington       Indiana      Bloomington
## 4                Kennesaw State University       Georgia         Kennesaw
## 5                      New York University      New York         New York
## 6                 University of Cincinnati          Ohio       Cincinnati
## 7      University of Maryland-College Park      Maryland     College Park
## 8       University of Massachusetts-Boston Massachusetts           Boston
## 9        University of Southern California    California      Los Angeles
## 10 University of Washington-Seattle Campus    Washington          Seattle
## 11         Worcester Polytechnic Institute Massachusetts        Worcester
##                                                                                   PROGRAM
## 1                                            Doctorate in Computational and Data Sciences
## 2                        Doctor of Computer Science - Concentration in Big Data Analytics
## 3                                                             Ph.D. Minor in Data Science
## 4                                                     Ph.D. in Analytics and Data Science
## 5  Ph.D. in Computer Science with Specialization in Visualization, Databases and Big Data
## 6                                   Doctor of Philosopy in Biostatistics - Big Data Track
## 7                   Ph.D. in Information Studies - Concentration in Big Data/Data Science
## 8           Ph.D. in Business Administration - Information Systems for Data Science Track
## 9                                                     Ph.D. in Data Sciences & Operations
## 10                                                     Ph.D. in Big Data and Data Science
## 11                                                                  Ph.D. in Data Science
##    TYPE                             DEPARTMENT         DELIVERY
## 1     M Schmid College of Science & Technology           Campus
## 2     M            Computer Science Department           Online
## 3     M    School of Informatics and Computing Campus or Online
## 4     M       College of Science & Mathematics           Campus
## 5     M           Tandon School of Engineering           Campus
## 6     M                    College Of Medicine           Campus
## 7     M         College of Information Studies           Campus
## 8     M                  College of Management           Campus
## 9     M            Marshall School of Business           Campus
## 10    M                     eScience Institute           Campus
## 11    M             College of Arts & Sciences           Campus
##                                                                                  LINK
## 1                 http://www.chapman.edu/scst/graduate/phd-computational-science.aspx
## 2  http://www.coloradotech.edu/degrees/doctorates/computer-science/big-data-analytics
## 3       http://www.soic.indiana.edu/graduate/degrees/data-science/graduate/index.html
## 4                   https://analytics.kennesaw.edu/academics/grad/MSAS/msas-curr.html
## 5                         http://steinhardt.nyu.edu/graduate_admissions/guide/assr/ms
## 6                 https://eh.uc.edu/bio/academic-programs/phd-biostatistics/big-data/
## 7                                                 http://ischool.umd.edu/tuition-fees
## 8        https://www.umb.edu/academics/caps/certificates/business_analytics/admission
## 9                                http://www.marshall.usc.edu/msanalytics/program_cost
## 10                               http://www.pce.uw.edu/certificates/data-science.html
## 11                  http://www.wpi.edu/academics/datascience/certificate-program.html
##    LOC_LAT  LOC_LONG NUM_STUDENTS INTERNATIONAL_STUDENTS YEAR
## 1  33.7937 -117.8510         <NA>                   <NA>   NA
## 2  38.8937 -104.8340         <NA>                   <NA>   NA
## 3  39.1664  -86.5269         <NA>                   <NA>   NA
## 4  34.0363  -84.5808         <NA>                   <NA>   NA
## 5  40.7295  -73.9973       42,056                    19% 2016
## 6  39.1312  -84.5143       36,108                     6% 2016
## 7  38.9886  -76.9397         <NA>                   <NA>   NA
## 8  42.3145  -71.0387         <NA>                   <NA>   NA
## 9  34.0211 -118.2840       36,534                    20% 2016
## 10 47.6562 -122.3130         <NA>                   <NA>   NA
## 11 42.2751  -71.8088         <NA>                   <NA>   NA
##           DELIVERY2
## 1            Campus
## 2            Online
## 3  Campus or online
## 4            Campus
## 5            Campus
## 6            Campus
## 7            Campus
## 8            Campus
## 9            Campus
## 10           Campus
## 11           Campus
##                                                          PROGRAM2 TYPE2
## 1                                      *PhD*ate  *DS* *CS* *CDS*s   PhD
## 2                    *PhD*  *CS* - concentration  big *Analytics*   PhD
## 3                                               *PhD* minor  *DS*   PhD
## 4                                        *PhD*  *Analytics*  *DS*   PhD
## 5  *PhD*  *CS*  specialization  visualization databases  big data   PhD
## 6          *PhD*  philosopy  *Health-Bio**Stats* - big data track   PhD
## 7       *PhD*  information studies - concentration  big data/*DS*   PhD
## 8               *PhD*  business administration - *IS*  *DS* track   PhD
## 9                                        *PhD*  *DS*s  operations   PhD
## 10                                          *PhD*  big data  *DS*   PhD
## 11                                                    *PhD*  *DS*   PhD
##    PROG
## 1   PhD
## 2   PhD
## 3   PhD
## 4   PhD
## 5   PhD
## 6   PhD
## 7   PhD
## 8   PhD
## 9   PhD
## 10  PhD
## 11  PhD
# Keep only the PhD programs and the columns needed for the map markers
map2 <- dsp %>%
  filter(TYPE2 == 'PhD') %>%
  select(LOC_LONG, LOC_LAT, LINK, PROGRAM, SCHOOL, CITY, STATE) %>%
  rename(long = LOC_LONG, lat = LOC_LAT)

leaflet(data = map) %>%
  addTiles() %>%
  addPolygons(fillOpacity = 0.8,
              smoothFactor = 0.5,
              color = ~pal(count_STATE)) %>% 
  addPolylines(color = "red") %>% 
  addMarkers(data = map2, lng = ~long, lat = ~lat,
             popup = ~paste(SCHOOL, PROGRAM, LINK, sep = ':'),
             label = ~paste(CITY, STATE, sep = ","))
#leaflet(data = map2) %>% addTiles() %>%
#  addCircleMarkers(~long, ~lat, popup = ~paste(SCHOOL, PROGRAM, LINK, sep=':'), label = ~paste(CITY, STATE, sep = ","))

Next, I look at which Master of Science programs exist.

# Keep only the MS programs and the columns needed for the map markers
map2 <- dsp %>%
  filter(PROG == 'MS') %>%
  select(LOC_LONG, LOC_LAT, LINK, PROGRAM, SCHOOL, CITY, STATE) %>%
  rename(long = LOC_LONG, lat = LOC_LAT)

leaflet(data = map) %>%
  addTiles() %>%
  addPolygons(fillOpacity = 0.8,
              smoothFactor = 0.5,
              color = ~pal(count_STATE)) %>% 
  addPolylines(color = "red") %>% 
  addMarkers(data = map2, lng = ~long, lat = ~lat,
             popup = ~paste(SCHOOL, PROGRAM, LINK, sep = ':'),
             label = ~paste(CITY, STATE, sep = ","))
#leaflet(data = map2) %>% addTiles() %>%
#  addCircleMarkers(~long, ~lat, popup = ~paste(SCHOOL, PROGRAM, LINK, sep=':'), label = ~paste(CITY, STATE, sep = ","))
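
The PhD and MS chunks repeat the same filter, select and rename step, changing only the column and value being matched. As a minimal sketch, that step could be factored into a helper. The name `program_points()` and the `toy` data frame are hypothetical and not part of the original analysis. The toy values are made up and only mimic the column names of `dsp`. The sketch uses base R instead of dplyr so it runs without extra packages.

```r
# Hypothetical helper (not in the original analysis): keeps the rows where
# `column` equals `value` and returns the columns used for the map markers,
# with the coordinates renamed to long/lat.
program_points <- function(df, column, value) {
  keep <- !is.na(df[[column]]) & df[[column]] == value
  out <- df[keep, c("LOC_LONG", "LOC_LAT", "LINK", "PROGRAM",
                    "SCHOOL", "CITY", "STATE")]
  names(out)[names(out) == "LOC_LONG"] <- "long"
  names(out)[names(out) == "LOC_LAT"] <- "lat"
  out
}

# Toy stand-in for dsp; all values are illustrative, not taken from the table.
toy <- data.frame(
  LOC_LONG = c(-100, -90, -80),
  LOC_LAT  = c(40, 35, 30),
  LINK     = c("http://example.org/a", "http://example.org/b",
               "http://example.org/c"),
  PROGRAM  = c("Program A", "Program B", "Program C"),
  SCHOOL   = c("School A", "School B", "School C"),
  CITY     = c("City A", "City B", "City C"),
  STATE    = c("State A", "State B", "State C"),
  PROG     = c("MS", "PhD", "MS"),
  stringsAsFactors = FALSE
)

ms_points <- program_points(toy, "PROG", "MS")
print(ms_points[, c("long", "lat", "SCHOOL")])
```

With the real table, `program_points(dsp, "TYPE2", "PhD")` and `program_points(dsp, "PROG", "MS")` would give the same columns as the two `map2` chunks above, so either result could be passed to `addMarkers(data = ...)` in the same way.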